A UN Datathon Story
EBS, Monash
EBS, Monash
Education, Melbourne
Maths, QUT
Maths & Stats, USyd
February 29, 2024
Create a data solution that tackles one or more of the 17 Sustainable Development Goals (SDGs) by leveraging one of the six key transitions, with a focus on the SDG localisation enabler.
Globally, nearly a billion people lack reliable access to energy, and solar power is a cost-effective way to meet this demand.
Map the areas of the globe where solar farm investment would likely succeed, using existing solar farms as training data, then overlay this onto a map of energy demand proxied by night light data.
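Once the suitability predictions and the demand proxy sit on the same grid, the overlay reduces to a per-cell comparison. The base-R sketch below is illustrative rather than the team's code: `flagPriorityCells()` is a hypothetical helper, and treating "both scores at or above a chosen quantile" as the investment criterion is an assumption.

```r
# Hypothetical helper (not from the original project): flag grid cells whose
# predicted solar suitability AND estimated demand are both at or above the
# q-th quantile of their respective layers.
flagPriorityCells <- function(suitability, demand, q = 0.9) {
  stopifnot(length(suitability) == length(demand), q >= 0, q <= 1)
  suitCut <- unname(quantile(suitability, q, na.rm = TRUE))
  demandCut <- unname(quantile(demand, q, na.rm = TRUE))
  flag <- suitability >= suitCut & demand >= demandCut
  # Cells missing either layer (e.g. ocean) come out FALSE rather than NA
  flag %in% TRUE
}

# Toy example with made-up scores for six grid cells
cells <- data.frame(id = 1:6,
                    suitability = c(0.9, 0.2, 0.8, NA, 0.95, 0.1),
                    demand      = c(0.7, 0.9, 0.1, 0.5, 0.8, 0.0))
cells$priority <- flagPriorityCells(cells$suitability, cells$demand, q = 0.5)
cells$id[cells$priority]  # cells 1 and 5
```

Thresholding on quantiles keeps the two layers comparable even though they are measured on different scales; a weighted combined score would be an equally reasonable design.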
| Quantity | Source | Provided/Extracted Format |
|---|---|---|
| Population density | Google Earth Engine, provided by Oak Ridge National Laboratory | tiff |
| Night light intensity | NASA, Earth at Night project | tiff |
| Biomass/land use | NASA | tiff |
| Terrain slope | Google Earth Engine, provided by USGS | tiff |
| Photovoltaic potential | Global Solar Atlas | tiff |
| Solar farm locations | S. Dunnett, hosted on awesome-gee-community-catalog and figshare | csv |
All datasets were remapped from their raw forms onto a consistent 0.1° global grid (3600 × 1800 cells).
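```r
library(terra)
library(dplyr)
# Common 0.1° global grid: 3600 x 1800 cells spanning -180..180 lon, -90..90 lat
baseRaster <- terra::rast(ncols = 3600, nrows = 1800,
                          xmin = -180, xmax = 180, ymin = -90, ymax = 90)
# tiffFile: path to one raw GeoTIFF layer from the table above
rawValues <- terra::rast(tiffFile)
# Bilinear suits continuous layers; categorical ones (e.g. land-use classes) need method = "near"
consistentValues <- terra::resample(rawValues, baseRaster, method = "bilinear")
# Keep NA cells so row i is always grid cell i, giving every layer the same id
valueDataFrame <- as.data.frame(consistentValues, xy = TRUE, na.rm = FALSE) %>%
  mutate(id = seq_len(terra::ncell(consistentValues)))
```

Regress per-area power production at existing solar farm locations on a laughably small number of factors (photovoltaic potential, land use, terrain slope), using a “spatial” “random forest”.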
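The source does not say what made the random forest "spatial". One common interpretation, assumed here, is spatial block cross-validation: farms are grouped into lon/lat blocks and whole blocks are held out together, so neighbouring farms with near-identical conditions cannot sit on both sides of a train/test split. `spatialBlockFolds()` and its 10° default block size are hypothetical; the forest itself would come from a package such as ranger or randomForest and is omitted to keep the sketch dependency-free.

```r
# Hypothetical helper: assign each point (e.g. an existing solar farm) to a
# square lon/lat block, then deal whole blocks into k cross-validation folds
# so that points in the same block always share a fold.
spatialBlockFolds <- function(lon, lat, blockSizeDeg = 10, k = 5) {
  stopifnot(length(lon) == length(lat), k >= 1)
  blockId <- paste(floor(lon / blockSizeDeg), floor(lat / blockSizeDeg))
  blocks <- unique(blockId)
  # Cycle fold labels 1..k over the blocks, then shuffle which block gets which
  foldOfBlock <- sample(rep_len(seq_len(k), length(blocks)))
  names(foldOfBlock) <- blocks
  unname(foldOfBlock[blockId])
}

# Toy coordinates: three pairs of nearby points in three separate 10-degree blocks
set.seed(42)
lon <- c(10.5, 12.0, 150.2, 151.0, -70.3, -71.9)
lat <- c(-20.1, -25.0, -30.5, -33.0, 5.2, 8.8)
folds <- spatialBlockFolds(lon, lat, blockSizeDeg = 10, k = 3)
# Each fold f would then be held out in turn: fit on folds != f, evaluate on fold f
```

Larger blocks give a more pessimistic but more honest estimate of how well the model transfers to regions with no existing farms.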
Demand was modelled with a proxy quantity constructed from night light intensity and population density.
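The source does not give the formula for the demand proxy. One simple construction, entirely an assumption here, log-transforms both layers (night lights and population density are heavily right-skewed), rescales each to [0, 1] on the common grid, and takes a weighted average. `demandProxy()`, `rescale01()` and the equal weighting `w = 0.5` are hypothetical.

```r
# Hypothetical helper: min-max rescale to [0, 1], leaving NA cells as NA
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[2] == rng[1]) return(ifelse(is.na(x), NA_real_, 0))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical demand proxy: weighted average of log-scaled, rescaled night light
# intensity and population density; w is an assumed weight on night lights
demandProxy <- function(nightLight, popDensity, w = 0.5) {
  stopifnot(length(nightLight) == length(popDensity), w >= 0, w <= 1)
  w * rescale01(log1p(nightLight)) + (1 - w) * rescale01(log1p(popDensity))
}

# Toy per-cell values; in practice these would be columns of the gridded data frame
demand <- demandProxy(nightLight = c(0, 10, 100, NA),
                      popDensity = c(0, 50, 100, 20))
```

`log1p()` is used instead of `log()` so that cells with zero light or zero population map to 0 rather than `-Inf`.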
So none of us had much experience with spatial data.
Most of the day was spent collecting and sourcing data.
Our initial focus was on Africa, but we couldn’t find good shapefiles or fine-grained local data for the region.
Limitations:
We wanted spatial data at a finer resolution than country level that also covered the entire globe.
Work was done on an AWS EC2 instance running RStudio Server.